Automatic subword unit refinement for spontaneous speech recognition via phone splitting

Authors

  • Jon P. Nedel
  • Rita Singh
  • Richard M. Stern
Abstract

Spontaneous speech is highly variable and rarely conforms to conventional assumptions and linguistically defined pronunciation rules. Specifically, there may be many different continuous speech realizations for each expertly defined phonetic unit in the dictionary. The phones may be realized in a clean and complete fashion, as in read speech, or in a sloppy and incomplete fashion, as in highly spontaneous speech. For spontaneous speech, therefore, it may be beneficial to model incompletely realized variants of any phonetic unit as separate units. In this paper we test this hypothesis by introducing two possible modeling classes for the phones AA and IY in the standard English CMU recognition dictionary. We propose three different automatic methods of segregating the training data in order to identify and label the appropriate variants. Each of these methods improves recognition performance over the baseline, leading to the conclusion that finer modeling frameworks can help to properly parameterize and recognize spontaneous speech.
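The idea of splitting a phone into two modeling classes can be pictured with a small sketch. The abstract does not say what the three segregation methods are, so the duration threshold used below is purely an illustrative assumption, as are the `Segment` type, the `relabel` function, and the `_full` / `_reduced` suffixes. Only the choice of AA and IY comes from the abstract.

```python
from dataclasses import dataclass

# Phones the abstract says were split; everything else here is hypothetical.
SPLIT_PHONES = {"AA", "IY"}


@dataclass(frozen=True)
class Segment:
    phone: str        # phone label from the dictionary, e.g. "AA"
    duration_ms: int  # duration of the aligned segment in milliseconds


def relabel(segments, threshold_ms=60):
    """Assign each AA/IY segment to one of two variant classes.

    Assumption: a duration threshold stands in for the real segregation
    criterion, which the abstract does not specify.
    """
    labels = []
    for seg in segments:
        if seg.phone in SPLIT_PHONES:
            suffix = "_full" if seg.duration_ms >= threshold_ms else "_reduced"
            labels.append(seg.phone + suffix)
        else:
            # Phones outside the split set keep their original label.
            labels.append(seg.phone)
    return labels


example = [Segment("S", 80), Segment("IY", 110), Segment("AA", 35)]
labels = relabel(example)
print(labels)
```

In a setup like this, the relabeled transcriptions would then be used to train separate acoustic models for each variant, so that the recognizer can choose between them.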


Similar resources

Moving beyond the ‘beads-on-a-string’ Model of Speech

The notion that a word is composed of a sequence of phone segments, sometimes referred to as ‘beads on a string’, has formed the basis of most speech recognition work for over 15 years. However, as more researchers tackle spontaneous speech recognition tasks, that view is being called into question. This paper raises problems with the phoneme as the basic subword unit in speech recognition, sug...

Improving Under-Resourced Language ASR Through Latent Subword Unit Space Discovery

Development of state-of-the-art automatic speech recognition (ASR) systems requires acoustic resources (i.e., transcribed speech) as well as lexical resources (i.e., phonetic lexicons). It has been shown that acoustic and lexical resource constraints can be overcome by first training an acoustic model that captures acoustic-to-multilingual phone relationships on language-independent data; and th...

Pronunciation Lexicon Development for Under-Resourced Languages Using Automatically Derived Subword Units: A Case Study on Scottish Gaelic

Developing a phonetic lexicon for a language requires linguistic knowledge as well as human effort, which may not be available, particularly for under-resourced languages. To avoid the need for the linguistic knowledge, acoustic information can be used to automatically obtain the subword units and the associated pronunciations. Towards that, the present paper investigates the potential of a rec...

Constrained Subword Units for Speaker Recognition

Phonetic features have been proposed to overcome performance degradation in spectral speaker recognition in difficult acoustic conditions. The harmful effect of those conditions, however, is not restricted to spectral systems but also affects the performance of the open-loop phone recognisers on which phonetic systems are based. In automatic speech recognition, larger subword units and the use ...

Weighting Phone Confidence Measures for Automatic Speech Recognition

One of the most useful applications of Confidence Measures (CMs) in Automatic Speech Recognition systems is early detection of incorrect recognition hypotheses. A purely acoustic basis for such a CM is particularly important when tracking errors resulting from Out of Vocabulary speech, background noise or keyword substitution. A commonly taken approach is to compute scores on subword units of t...

Publication date: 2000